A Text Mining Library for Biodiversity Literature in Spanish
Abstract
Biodiversity represents a great ecological, economic and aesthetic heritage for the world. Most of the knowledge about this heritage can be found in thousands of documents that record valuable information gathered over centuries. Projects that try to gather and structure all this information, even for very specific topics, may take years. Moreover, keeping such a project up to date is difficult because new knowledge is continuously being published. Automatic methods are therefore needed to extract relevant information efficiently. In this article we describe the first stage of a software project that aims to build a complete library for applying Natural Language Processing techniques to documents about biodiversity in Spanish.
Similar resources
A Text Mining Framework for Accelerating the Semantic Curation of Literature
The Biodiversity Heritage Library is the world’s largest digital library of biodiversity literature. Currently containing almost 40 million pages, the library can be explored with a search interface employing keyword-matching, which unfortunately fails to address issues brought about by ambiguity. Helping alleviate these issues are tools that automatically attach semantic metadata to documents,...
Construction of a Biodiversity Knowledge Repository using a Text Mining-based Framework
In our aim to make the information encapsulated by biodiversity literature more accessible and searchable, we have developed a text mining-based framework for automatically transforming text into a structured knowledge repository. A text mining workflow employing information extraction techniques, i.e., named entity recognition and relation extraction, was implemented in the Argo platform and w...
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Constructing a biodiversity terminological inventory
The increasing growth of literature in biodiversity presents challenges to users who need to discover pertinent information in an efficient and timely manner. In response, text mining techniques offer solutions by facilitating the automated discovery of knowledge from large textual data. An important step in text mining is the recognition of concepts via their linguistic realisation, i.e., term...
Collaborating on Open Science: The Journey of the Biodiversity Heritage Library
The Biodiversity Heritage Library (BHL) is an established and successful digital library, formed by a global consortium of natural history libraries, with engaged and enthusiastic users. The extensive partnerships, curated content, innovative tools and services, and the ease of mining the data all combine to establish an open science resource that advances scientific progress through linking, use ...
Journal: Int. J. Comput. Linguistics Appl.
Volume: 6
Publication date: 2015